Biostatistics For Dummies (Monika Wahi John Pezzullo)

known to have died during the observation period, and 0 if they did not die.

The treatment group variable. In this case, we created the variable Radiation, and coded it as 1

if the participant was in the radiation group, and 0 if they were in the chemotherapy group. That

way, the coefficient produced in the output will indicate the increase or decrease in proportional

hazard associated with being in the radiation group compared to the chemotherapy group.

The clinical center variable. In this case, we chose to create an indicator variable called

CenterCD, which is 1 if the participant is from Center C or Center D, and is 0 if they are from A or

B. Alternatively, you could choose to create one indicator variable for each center, as described in

Chapter 18.

If you use a numerical variable such as age as a predictor and enter it into the model, the

resulting coefficient will apply to increasing this variable by one unit (such as for one year of

age).
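As a rough sketch of the coding scheme just described, here is one way you might build the three 0/1 variables in R. The data frame raw and its columns Center, Treatment, and Died are hypothetical names and made-up values used only for illustration; your own dataset will almost certainly be laid out differently.

```r
# Hypothetical raw data: names and values are invented for illustration only.
raw <- data.frame(
  Center    = c("A", "B", "C", "D", "C", "A"),
  Treatment = c("Radiation", "Chemo", "Radiation", "Chemo", "Chemo", "Radiation"),
  Died      = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
)

# Status: 1 if the participant is known to have died during observation, else 0.
raw$Status <- as.integer(raw$Died)

# Radiation: 1 for the radiation group, 0 for the chemotherapy group.
raw$Radiation <- as.integer(raw$Treatment == "Radiation")

# CenterCD: 1 for Center C or D, 0 for Center A or B.
raw$CenterCD <- as.integer(raw$Center %in% c("C", "D"))

print(raw[, c("Status", "Radiation", "CenterCD")])
```

Coding the reference group as 0 (chemotherapy, Centers A and B) is what makes each coefficient read as "compared to" that group.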

Using the R statistical software, the PH regression can be invoked with a single command:

coxph(formula = Surv(Time, Status) ~ CenterCD + Radiation)

Figure 23-4 shows R’s output, using the data that we graph in Figure 23-3. The output from other

statistical programs won’t look exactly like Figure 23-4, but you should be able to find the main

components described in the following sections.
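If you want to try the command yourself, the following is a minimal sketch rather than a reproduction of our analysis. It assumes the survival package (which supplies Surv and coxph) and a small made-up data frame we call toy; the numbers are invented and are not the data graphed in Figure 23-3. It also passes the data frame through the data argument, which is one common way to tell R where the variables live.

```r
library(survival)  # provides Surv() and coxph()

# Made-up data for illustration only (not the data behind Figure 23-3).
toy <- data.frame(
  Time      = c(5, 8, 12, 14, 20, 22, 25, 30, 33, 36, 40, 45),  # follow-up time
  Status    = c(1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0),            # 1 = died, 0 = censored
  Radiation = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1),            # 1 = radiation, 0 = chemo
  CenterCD  = c(1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0)             # 1 = Center C or D
)

# Fit the PH regression with the same formula used in the text.
fit <- coxph(Surv(Time, Status) ~ CenterCD + Radiation, data = toy)

# summary() prints coefficients, hazard ratios, and overall tests.
print(summary(fit))

# Hazard ratios are the exponentiated coefficients.
hr <- exp(coef(fit))
print(hr)
```

With only a dozen invented rows, the estimates themselves mean nothing; the point is simply to see where the coefficients and hazard ratios appear in the output.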

FIGURE 23-4: Output of a PH regression from R.

Testing the validity of the assumptions

When you’re analyzing data using PH regression, you’re assuming that your data are consistent with the

idea of flexing a baseline survival curve by raising all the points in the entire curve to the same power

(shown as h in Figures 23-1b and 23-2b). You’re not allowed to twist the curve so that it goes higher than the baseline curve (h < 1) for small time values and lower than baseline (h > 1) for large time values. That would be a non-PH flexing of the curve.
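To make the idea of flexing versus twisting concrete, here is a small numerical sketch. The baseline curve S0 is a hypothetical exponential curve chosen only for convenience, and the variable names are ours. Raising every point to the same power h keeps the ratio log(S)/log(S0) constant over time, whereas a twisted curve uses one power early and another later, so that ratio changes.

```r
t  <- 1:10
S0 <- exp(-0.1 * t)          # hypothetical baseline survival curve

# PH flexing: the same power h is applied at every time point.
h        <- 2
S_ph     <- S0^h
ratio_ph <- log(S_ph) / log(S0)        # the same value (h) at every time

# Non-PH twisting: power below 1 early (curve above baseline),
# power above 1 later (curve below baseline).
h_twist     <- ifelse(t <= 5, 0.5, 2)
S_twist     <- S0^h_twist
ratio_twist <- log(S_twist) / log(S0)  # changes over time, so not PH

print(round(cbind(t, S0, S_ph, ratio_ph, S_twist, ratio_twist), 3))
```

A constant ratio column is what the PH assumption looks like in numbers; a ratio that drifts or jumps, as in the twisted curve, signals a violation.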